Week 04: Bottom-Up Parsing I & II

Predictive Parser: LL(1)

Concept
- Look at the next? tokens.
- No backtracking.
- Accept LL(K) grammars. (Left-to-Right, Leftmost derivation, k tokens)
- At each step, only one choice.
- To avoid ambiguousness, need to left-factor the grammar.
Parsing Table
- Leftmode Non-terminal x Next Input Token
- Use stack to record frontier of parse tree
Differnce from Recursive Descent
- For the leftmost non-terminal S
- Look at the next input token a
- Choose the production shown at [S,a]
Reject on reaching error state
Accept on (end of input && empty stack)

Def. (First Set)

$$First(X)=\{t\ |\ X\rightarrow^*t\alpha\}\cup\{\epsilon\ |\ X\rightarrow^*\epsilon\}$$

Algo

$First(X)=\{t\}, if\ t\ is\ a\ terminal.$
$\epsilon \in First(X), if \begin{cases} X\rightarrow \epsilon \\ X\rightarrow A_1A_2\dots A_n, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n \end{cases}$
$First(\alpha) \subseteq First(X), if\ X\rightarrow A_1A_2\dots A_n\alpha, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n$

Def. (Follow Set)

$$Follow(X)=\{t\ |\ S\rightarrow^*\beta Xt\delta\}$$

Algo

$\$ \in Follow(S)$
$First(\beta)-\{\epsilon\}\subseteq Follow(X), for\ each\ A\rightarrow\alpha X\beta$
$Follow(A) \subseteq Follow(X), for\ each A\rightarrow\alpha X\beta\ where\ \epsilon\in First(\beta)$

Construct a parsing table T for CFG G
For each production $A \rightarrow \alpha$ in G do:
- For each terminal $t \in First(\alpha)$, $T[A, t] = \alpha$
- If $\epsilon \in First(\alpha),$, for each $t \in Follow(A)$, $T[A,t] = \alpha$
- If $\epsilon \in First(\alpha)$ and $ \$ \in Follow(A)$, $T[A,\$] = \alpha$
If any entry is multiply defined, then G is not LL(1).

Concept
- Don't need left-factored grammars.
- Reduce the string to the start symbol. (Inverting production)
- A bottom-up parser traces a rightmost derivation in reverse. $$ \begin{align} & int*int + int \\ & \textbf{int*T} + int \\ & \textbf{T} + int \\ & T + \textbf{T} \\ & T + \textbf{E} \\ & \textbf{E} \end{align} $$
- Thm.
  - In some step, let string as $\alpha \beta \gamma$.
  - Assume the next reduction is by $X \rightarrow \beta$. ($\alpha \beta \gamma \rightarrow \alpha X \gamma$)
  - Then $\gamma$ is a string which contains only terminals.
- Split the string as $L_{Str} | R_{Str}$.
- $L_{Str}$ contains non-terminals and terminals. $R_{Str}$ contains only terminals.
Actions
- Shift: Move | one place to the right. Shift a terminal to the left string.
- Reduce: Apply an inverse production at the right of $L_{str}$. $$ for\ A \rightarrow xy, Cbxy|ijk \Rightarrow CbA|ijk $$
Implementation
- Use a stack to maintain $L_{Str}$.
If is's legal to shift or reduce, there is a shift-reduce conflict.
If is's legal to reduce by two productions, there is a reduce-reduce conflict.

Scenario
- Grammar
  - $E \rightarrow T+E|E$
  - $T \rightarrow int*T|int|(E)$
- At the steop $int|*int+int$
- If we reduce by ($T \rightarrow int$) given ($T | *int+int$), it will fail.
Target
- Want to reduce only if the result can still be reduced to the start symbol.
A handle is the rhs of the production that you can trace back in the parse tree.
- $\Rightarrow$ Handles are the children of some internal node of some AST.
Assume a rightmost derivation $$S \rightarrow^* \alpha X \omega \rightarrow \alpha\beta\omega$$
- Then $\beta$ is a handle of $\alpha \beta \omega$.
Thm.
- In shift-reduce parsing, handles appear only at the top of the stack, never inside.
Bottom-up parsing algorithms are based on recognizing handles

There are no known efficient algo to recognize handles. But there are good heuristics.
Viable Prefix: $\alpha_1\dots\alpha_n$ is a viable prefix if there is an $\omega$ s.t. $\alpha_1\dots\alpha_n\ |\ \omega$ is a state of a shift-reduce parser.
- A viable prefix is always the prefix of some handle.
- If we can maintain the stack's contents are viable prefixes all the time, no parsing error will occur.
Thm. For any grammar, the set of viable prefixes is a regular language.
Item: An item is a production with '.' somewhere on the rhs.
- Example. The items for the production rule $T \rightarrow (E)$ are
  - $T \rightarrow .(E)$
  - $T \rightarrow (.E)$
  - $T \rightarrow (E.)$
  - $T \rightarrow (E).$
- Example. The items for ($X \rightarrow \epsilon$) are $X \rightarrow .$
- Items are often called LR(0) items.
Target: To describe the states of the production rule.
- If we need to use some production rule $R$ to reduce, the top of the stack will be the left string split by '.' on some item of $R$.
- Example.
  - Input: $(int)$
  - Grammar
    - $E \rightarrow T+E\ |\ T$
    - $T \rightarrow int*T\ |\ int\ |\ (E)$
  - $(E$ is the left string split by '.' on the item $T \rightarrow (E.)$

Add a dummy production $S' \rightarrow S$ to $G$
The NFA states are the items of $G$
For item $E \rightarrow \alpha .X\beta$ add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^X (E \rightarrow \alpha X.\beta)$
For item $E \rightarrow \alpha .X\beta$ and production $X \rightarrow \gamma$, add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^\epsilon (X \rightarrow .\gamma)$
Every state is an accepting state
Start state is $S' \rightarrow .S$

Concept
- Item $X \rightarrow \beta .\gamma$ is valid for a viable prefix $\alpha\beta$ if $$S' \rightarrow^* \alpha X\omega \rightarrow \alpha\beta\gamma\omega$$ by a right-most derivation.
- An item $I$ is valid for a viable prefix $\alpha(\alpha_1\dots\alpha_n)$ if the DFA accepts $\alpha_1\dots\alpha_n$ and terminates on some state $s$ containing $I$.
- The items in $s$ describe what the top of the item stack might be after reading input $\alpha_1\dots\alpha_n$.

Assume
- Stack contains $\alpha_1\dots\alpha_k$
- Next input is $t$
- DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
Action
- Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$
- Shift if $s$ contains item $X \rightarrow \beta .t\omega$
Conflict
- Reduce/Reduce if any state contains two reduce rule.
- Shift/Reduce if any state has a reduced item and a shift item

Assume
- Stack contains $\alpha_1\dots\alpha_k$
- Next input is $t$
- DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
Action
- Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$ and $t \in Follow(X)$
- Shift if $s$ contains item $X \rightarrow \beta .t\omega$
If there are conflicts under these rules, the grammar is not SLR.
We can use precedence declarations to resolve conflicts. * is grater than +.